Information Gain
$H(X)$ is the entropy computed over the class distribution of $X$. $$ \begin{aligned} G(X, A) &= I(X; A) = H(X) - H(X \mid A) = H(X) - \sum_{a \in A} P(A = a)\, H(X \mid A = a) \\ &= \mathbb{E}_A \left[ D(p_{X|A} \,\|\, p_X)\right] \end{aligned} $$ So information gain is exactly the mutual information between $X$ and $A$: the reduction in entropy of $X$ once $A$ is known, which can equivalently be written as the expected KL divergence between the conditional class distribution $p_{X|A}$ and the marginal $p_X$.
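As a minimal sketch, the definition above can be computed directly: partition the labels by attribute value, take the weighted conditional entropy $\sum_a P(a) H(X \mid A = a)$, and subtract from $H(X)$. The function names and the toy "outlook" data below are illustrative, not from the source.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy H(X) of a sequence of class labels, in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(labels, attribute_values):
    """G(X, A) = H(X) - sum_a P(A=a) * H(X | A=a)."""
    n = len(labels)
    # Partition the class labels by the value of attribute A
    partitions = {}
    for label, a in zip(labels, attribute_values):
        partitions.setdefault(a, []).append(label)
    # Weighted conditional entropy H(X|A); the len(part)/n factor is P(A=a)
    h_cond = sum((len(part) / n) * entropy(part) for part in partitions.values())
    return entropy(labels) - h_cond

# Toy data (hypothetical): 3 "yes" / 3 "no" labels, split by outlook
labels  = ["yes", "yes", "no", "no", "yes", "no"]
outlook = ["sun", "sun", "rain", "rain", "sun", "sun"]
print(round(information_gain(labels, outlook), 3))  # → 0.459
```

Note that omitting the $P(A=a)$ weight would overcount small partitions: a pure two-element branch contributes $0$ to $H(X \mid A)$ only after weighting by its probability $2/6$.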